TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAs
Authors
Abstract
The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, existing sorter designs cannot be directly scaled at the increasing rate of available off-chip bandwidth, as the required on-chip resource usage grows at a much faster rate and would bound the sorting performance in turn. Second, designers need an in-depth understanding of HBM's characteristics to effectively utilize the HBM bandwidth. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize 32 HBM channels' bandwidth. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree to merge the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB of data using all 32 HBM channels, with an overall sorting performance of 15.6 GB/s. TopSort is 6.7× and 2.7× faster than state-of-the-art CPU and FPGA sorters.
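The two-phase structure described in the abstract can be illustrated in software. The following is only a minimal sketch, not the authors' hardware design: it assumes each phase-one merge tree can be modeled as an independent sort of one data chunk, and that the wider phase-two merge tree can be modeled as a single k-way merge. The function names (`phase_one`, `phase_two`, `two_phase_sort`) are hypothetical, and HBM channel behavior is not modeled at all.

```python
import heapq
from typing import List, Sequence

NUM_TREES = 16  # number of phase-one merge trees stated in the abstract


def phase_one(data: Sequence[int], num_trees: int = NUM_TREES) -> List[List[int]]:
    """Split the input into at most num_trees chunks and sort each one.

    Assumption: sorted() stands in for a hardware merge tree that would
    sort its own chunk independently of (and in parallel with) the others.
    """
    if not data:
        return []
    chunk_len = -(-len(data) // num_trees)  # ceiling division
    return [sorted(data[i:i + chunk_len]) for i in range(0, len(data), chunk_len)]


def phase_two(runs: List[List[int]]) -> List[int]:
    """Merge all partially sorted runs in one pass.

    Assumption: heapq.merge stands in for the single wider merge tree
    that phase two builds by reusing the phase-one logic.
    """
    return list(heapq.merge(*runs))


def two_phase_sort(data: Sequence[int], num_trees: int = NUM_TREES) -> List[int]:
    """Run both phases back to back."""
    return phase_two(phase_one(data, num_trees))
```

For example, `two_phase_sort([5, 3, 9, 1])` produces four single-element runs in phase one and merges them into one sorted list in phase two. The point of the split is that phase one needs no coordination between chunks, while phase two only has to merge a small, fixed number of sorted runs.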
Similar resources
A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism
Sorting is an extremely important computation kernel that has been accelerated in a lot of fields such as databases, image processing, and genome analysis. Given that advent of Internet of Things (IoT) era due to mobile technology progressions, the future needs a sorting method that is available on any environment, such as not only high performance systems like servers but also low computationa...
High Performance Monte-Carlo Based Option Pricing on FPGAs
High performance computing is becoming increasingly important in the field of financial computing, as the complexity of financial models continues to increase. Many of these financial models do not have a practical close form solution in which case numerical methods are the only alternative. Monte-Carlo simulation is one of most commonly used numerical methods, in scientific computing in genera...
A High Performance FPGA-Based Accelerator for BLAS Library Implementation
This paper describes the implementation and the performance analysis of a hardware accelerator for the BLAS library matrix multiplication operation. This accelerator is based on a dual-FPGA board and on an implementation BLAS software library making use of the FPGA-based hardware. In order to evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dg...
Optimized Implementation of RNS FIR Filters Based on FPGAs
In this paper optimized Residue Number System (RNS) arithmetic blocks to better exploit some of the architectural characteristics of the last generation FPGAs are presented. The implementation of modulo m adders, modulo m constant and general multipliers, input and output converters are presented. These architectures are based on moduli sets chosen in order to optimally use the 6-input Look-Up ...
High Performance Computing with FPGAs
Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges to find the right processing paradigm which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this p...
Journal
Journal title: IEEE Transactions on Emerging Topics in Computing
Year: 2022
ISSN: 2168-6750, 2376-4562
DOI: https://doi.org/10.1109/tetc.2022.3228575